In [1]:
# This notebook is kept separate because both libraries (dataprep and ppscore) only work with
# a pandas version older than the one used in the main notebook (2.0.1)
from dataprep.eda import create_report
import pandas as pd
import warnings

import plotly.express as px

# PPS is an asymmetric, data-type-agnostic score that can detect linear or non-linear
# relationships between two columns. The score ranges from 0 (no predictive power)
# to 1 (perfect predictive power). It can be used as an alternative to the correlation (matrix).
import ppscore as pps

warnings.filterwarnings("ignore")
In [2]:
# Use the dataset with subsampling and oversampling applied, so the PPS is easier to appreciate
df = pd.read_parquet('data/sampled_data.parquet')

Predictive Power Score

In [3]:
%%time
data_pps = pps.matrix(df)[['x', 'y', 'ppscore']]
CPU times: total: 20.6 s
Wall time: 40.5 s
In [4]:
data_pps = data_pps.pivot(columns='x', index='y', values='ppscore')
In [5]:
fig = px.imshow(data_pps.round(1), text_auto=True, aspect="auto", color_continuous_scale=px.colors.sequential.Blues)
fig.layout.height = 700
fig.layout.width = 1120
fig.update_coloraxes(showscale=False)
fig.update_layout(
    title_text="Predictive Power Score Heatmap")
fig.show()

💡 Interpretation:

  • The score always ranges from 0 to 1, regardless of the data type.
  • A score of 0 means that column x cannot predict column y any better than a naive baseline model.
  • A score of 1 means that column x predicts column y perfectly, given the model.
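As a rough illustration of where the 0 and 1 endpoints come from, the sketch below reproduces the normalization step as we understand it from the ppscore documentation: for a numeric target, the model's MAE is compared with a naive baseline that always predicts the median; for a categorical target, the model's weighted F1 is compared with a naive baseline's F1. The function names are hypothetical, and the model-fitting step itself (ppscore fits a cross-validated decision tree) is deliberately left out.

```python
def normalized_mae_score(model_mae: float, naive_mae: float) -> float:
    """Regression case: 1 - MAE_model / MAE_naive, floored at 0.

    The naive baseline is assumed to always predict the median of the target.
    """
    if naive_mae == 0:
        return 0.0  # constant target: nothing to predict (simplifying assumption)
    if model_mae > naive_mae:
        return 0.0  # worse than the baseline -> no predictive power
    return 1.0 - model_mae / naive_mae


def normalized_f1_score(model_f1: float, naive_f1: float) -> float:
    """Classification case: (F1_model - F1_naive) / (1 - F1_naive), floored at 0."""
    if naive_f1 >= 1.0:
        return 0.0  # baseline is already perfect (simplifying assumption)
    if model_f1 < naive_f1:
        return 0.0
    return (model_f1 - naive_f1) / (1.0 - naive_f1)


print(normalized_mae_score(model_mae=0.0, naive_mae=2.0))  # 1.0 -> perfect model
print(normalized_mae_score(model_mae=2.0, naive_mae=2.0))  # 0.0 -> no better than baseline
print(normalized_f1_score(model_f1=0.9, naive_f1=0.5))
```

Flooring at 0 is what makes "worse than the baseline" and "as good as the baseline" collapse into the same score.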

From the chart:

  • Variables variable_12, variable_13, variable_15, variable_17, variable_18, variable_19 and variable_25 have a predictive power greater than 0.5 on is_fraud.
  • The relationship does not hold in the opposite direction: is_fraud does NOT have a predictive power greater than 0.5 on any of these variables.
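Both observations can be checked programmatically by reading the pivoted matrix from In [4] in both directions. Because data_pps was pivoted with index='y' and columns='x', the row data_pps.loc['is_fraud'] holds the scores of each feature predicting is_fraud, and the column data_pps['is_fraud'] holds the scores of is_fraud predicting each feature. The helper name and the toy numbers below are hypothetical; only the row/column layout comes from the notebook.

```python
import pandas as pd


def compare_directions(pps_wide: pd.DataFrame, target: str, threshold: float = 0.5) -> pd.DataFrame:
    """List features whose PPS on `target` exceeds `threshold`, next to the reverse PPS.

    Assumes the layout produced by pivot(columns='x', index='y', values='ppscore'):
    rows are the predicted column (y), columns are the predictor (x).
    """
    to_target = pps_wide.loc[target].drop(target)    # PPS(feature -> target)
    from_target = pps_wide[target].drop(target)      # PPS(target -> feature)
    strong = to_target[to_target > threshold].sort_values(ascending=False)
    return pd.DataFrame({
        "feature_to_target": strong,
        "target_to_feature": from_target.reindex(strong.index),
    })


# Toy matrix with made-up scores, only to illustrate the layout.
labels = ["v1", "v2", "is_fraud"]
toy = pd.DataFrame(
    [[1.0, 0.0, 0.3],   # y = v1
     [0.0, 1.0, 0.0],   # y = v2
     [0.8, 0.1, 1.0]],  # y = is_fraud
    index=pd.Index(labels, name="y"),
    columns=pd.Index(labels, name="x"),
)
result = compare_directions(toy, "is_fraud")
print(result)
```

With the notebook's own matrix, the equivalent call would be compare_directions(data_pps, "is_fraud").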

The PPS can serve as a reference when selecting or discarding variables. Unlike correlation, whose matrix is symmetric about the diagonal (corr(v1, v2) equals corr(v2, v1)), the PPS is asymmetric: the score of v1 predicting v2 generally differs from the score of v2 predicting v1. This directionality gives it more analytical power.
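To make the asymmetry concrete, here is a self-contained toy in plain Python, assuming y = x² on a symmetric x. The Pearson correlation is exactly 0 in both directions, yet x determines y completely, while y cannot tell whether x was positive or negative. toy_pps is a hypothetical stand-in, not ppscore: it uses an in-sample group-median predictor instead of ppscore's cross-validated decision tree, and it only covers a numeric target.

```python
from statistics import median


def group_median_mae(feature, target):
    """In-sample MAE of a toy model that predicts the median target for each feature value."""
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    prediction = {f: median(ts) for f, ts in groups.items()}
    return sum(abs(t - prediction[f]) for f, t in zip(feature, target)) / len(target)


def naive_mae(target):
    """MAE of a baseline that always predicts the overall median."""
    m = median(target)
    return sum(abs(t - m) for t in target) / len(target)


def toy_pps(feature, target):
    """1 - MAE_model / MAE_naive, floored at 0 (no cross-validation, no decision tree)."""
    model, naive = group_median_mae(feature, target), naive_mae(target)
    if naive == 0 or model > naive:
        return 0.0
    return 1.0 - model / naive


def pearson(a, b):
    """Plain Pearson correlation; symmetric by construction."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # y = x**2: fully determined by x

print(pearson(x, y), pearson(y, x))  # 0.0 0.0 -> correlation sees nothing, in either direction
print(toy_pps(x, y))                 # 1.0 -> x predicts y perfectly
print(toy_pps(y, x))                 # 0.0 -> y cannot recover the sign of x
```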


Automated Exploratory Data Analysis with dataprep

In [6]:
%%time
create_report(df)
CPU times: total: 6.78 s
Wall time: 15.7 s
Out[6]:
DataPrep Report

Overview

Dataset Statistics

Number of Variables 31
Number of Rows 7095
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 1.7 MB
Average Row Size in Memory 248.0 B
Variable Types
  • Numerical: 30
  • Categorical: 1

Dataset Insights

  • variable_04 and variable_05 have similar distributions
  • variable_13 and variable_22 have similar distributions
  • variable_18 and variable_25 have similar distributions
  • variable_18 and variable_27 have similar distributions
  • scaled_amount is skewed
  • variable_01 is skewed
  • variable_02 is skewed
  • variable_06 is skewed
  • variable_07 is skewed
  • variable_08 is skewed
  • variable_09 is skewed
  • variable_11 is skewed
  • variable_12 is skewed
  • variable_13 is skewed
  • variable_15 is skewed
  • variable_17 is skewed
  • variable_19 is skewed
  • variable_20 is skewed
  • variable_21 is skewed
  • variable_22 is skewed
  • variable_23 is skewed
  • variable_24 is skewed
  • variable_26 is skewed
  • variable_27 is skewed
  • variable_28 is skewed
  • is_fraud has constant length 3
  • scaled_amount has 3714 (52.35%) negatives
  • scaled_timestamp has 3724 (52.49%) negatives
  • variable_01 has 2761 (38.91%) negatives
  • variable_02 has 2963 (41.76%) negatives
  • variable_03 has 3645 (51.37%) negatives
  • variable_04 has 3388 (47.75%) negatives
  • variable_05 has 3459 (48.75%) negatives
  • variable_06 has 3851 (54.28%) negatives
  • variable_07 has 3465 (48.84%) negatives
  • variable_08 has 3134 (44.17%) negatives
  • variable_09 has 3609 (50.87%) negatives
  • variable_10 has 3168 (44.65%) negatives
  • variable_11 has 4127 (58.17%) negatives
  • variable_12 has 4448 (62.69%) negatives
  • variable_13 has 4114 (57.98%) negatives
  • variable_14 has 3394 (47.84%) negatives
  • variable_15 has 4491 (63.3%) negatives
  • variable_16 has 3699 (52.14%) negatives
  • variable_17 has 4285 (60.39%) negatives
  • variable_18 has 2579 (36.35%) negatives
  • variable_19 has 4849 (68.34%) negatives
  • variable_20 has 4585 (64.62%) negatives
  • variable_21 has 2913 (41.06%) negatives
  • variable_22 has 4186 (59.0%) negatives
  • variable_23 has 4979 (70.18%) negatives
  • variable_24 has 4146 (58.44%) negatives
  • variable_25 has 2438 (34.36%) negatives
  • variable_26 has 4340 (61.17%) negatives
  • variable_27 has 2441 (34.4%) negatives
  • variable_28 has 4180 (58.91%) negatives
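To cross-check some of the per-variable figures below (negatives, skewness, kurtosis, IQR, coefficient of variation, outliers) directly with pandas, a sketch like the following could be used. It approximates the report and is not DataPrep's implementation: the 1.5 × IQR outlier fences are an assumption, and pandas' skew() and kurt() use bias-corrected estimators (kurt() returns excess kurtosis), so the numbers may differ slightly. The report's Coefficient of Variation is consistent with standard deviation divided by mean (for scaled_amount, 3.1495 / 0.9796 ≈ 3.215), which is why a near-zero mean such as variable_14's (-0.005751) produces an extreme value like -162.9. The helper name is hypothetical.

```python
import pandas as pd


def describe_numeric(s: pd.Series) -> dict:
    """Approximate a few DataPrep-style statistics for one numeric column."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    # Assumption: Tukey's 1.5 * IQR fences; DataPrep's outlier rule may differ.
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "negatives": int((s < 0).sum()),
        "negatives_pct": round(100 * (s < 0).mean(), 2),
        "skewness": s.skew(),      # bias-corrected sample skewness
        "kurtosis": s.kurt(),      # excess kurtosis (normal distribution -> 0)
        "iqr": iqr,
        "cv": s.std() / s.mean(),  # unstable when the mean is close to 0
        "outliers": int(((s < lower) | (s > upper)).sum()),
    }


toy = pd.Series([-2.0, -1.0, 0.0, 1.0, 2.0, 10.0])
summary = describe_numeric(toy)
print(summary)
```

Applied to the notebook's data, something like pd.DataFrame({c: describe_numeric(df[c]) for c in df.columns if c != "is_fraud"}).T would give one row per variable.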

Variables


scaled_amount

numerical

Approximate Distinct Count 3908
Approximate Unique (%) 55.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 0.9796
Minimum -0.3059
Maximum 105.1498
Zeros 7
Zeros (%) 0.1%
Negatives 3714
Negatives (%) 52.3%
  • scaled_amount is skewed right (γ1 = 9.0693)

Quantile Statistics

Minimum -0.3059
5-th Percentile -0.2952
Q1 -0.2649
Median -0.03769
Q3 0.9178
95-th Percentile 5.6323
Maximum 105.1498
Range 105.4557
IQR 1.1827

Descriptive Statistics

Mean 0.9796
Standard Deviation 3.1495
Variance 9.9192
Sum 6950.521
Skewness 9.0693
Kurtosis 192.6943
Coefficient of Variation 3.2149
  • scaled_amount is not normally distributed (p-value 6.575640771685192e-25)
  • scaled_amount has 770 outliers

scaled_timestamp

numerical

Approximate Distinct Count 6991
Approximate Unique (%) 98.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 0.05978
Minimum -0.9945
Maximum 1.0348
Zeros 0
Zeros (%) 0.0%
Negatives 3724
Negatives (%) 52.5%
  • scaled_timestamp is skewed right (γ1 = 0.075)

Quantile Statistics

Minimum -0.9945
5-th Percentile -0.7839
Q1 -0.4126
Median -0.04546
Q3 0.5968
95-th Percentile 0.9077
Maximum 1.0348
Range 2.0293
IQR 1.0094

Descriptive Statistics

Mean 0.05978
Standard Deviation 0.5576
Variance 0.3109
Sum 424.1193
Skewness 0.07501
Kurtosis -1.257
Coefficient of Variation 9.3271

variable_01

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 0.02487
Minimum -4.7497
Maximum 4.8505
Zeros 0
Zeros (%) 0.0%
Negatives 2761
Negatives (%) 38.9%
  • variable_01 is skewed left (γ1 = -1.1425)

Quantile Statistics

Minimum -4.7497
5-th Percentile -0.5558
Q1 -0.05321
Median 0.0251
Q3 0.1475
95-th Percentile 0.4954
Maximum 4.8505
Range 9.6002
IQR 0.2007

Descriptive Statistics

Mean 0.02487
Standard Deviation 0.3606
Variance 0.13
Sum 176.4843
Skewness -1.1425
Kurtosis 17.4415
Coefficient of Variation 14.4965
  • variable_01 is not normally distributed (p-value 1.1999244881392228e-19)
  • variable_01 has 984 outliers

variable_02

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 0.07501
Minimum -7.2635
Maximum 4.6277
Zeros 0
Zeros (%) 0.0%
Negatives 2963
Negatives (%) 41.8%
  • variable_02 is skewed left (γ1 = -2.187)

Quantile Statistics

Minimum -7.2635
5-th Percentile -0.8497
Q1 -0.06539
Median 0.02819
Q3 0.2726
95-th Percentile 1.2107
Maximum 4.6277
Range 11.8911
IQR 0.338

Descriptive Statistics

Mean 0.07501
Standard Deviation 0.7562
Variance 0.5718
Sum 532.2154
Skewness -2.187
Kurtosis 17.0281
Coefficient of Variation 10.0808
  • variable_02 is not normally distributed (p-value 2.552002738974818e-22)
  • variable_02 has 1098 outliers

variable_03

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 0.01285
Minimum -1.3989
Maximum 2.9521
Zeros 0
Zeros (%) 0.0%
Negatives 3645
Negatives (%) 51.4%
  • variable_03 is skewed right (γ1 = 0.5744)

Quantile Statistics

Minimum -1.3989
5-th Percentile -0.6599
Q1 -0.2982
Median -0.0222
Q3 0.2789
95-th Percentile 0.8213
Maximum 2.9521
Range 4.351
IQR 0.5771

Descriptive Statistics

Mean 0.01285
Standard Deviation 0.4593
Variance 0.211
Sum 91.1648
Skewness 0.5744
Kurtosis 1.4508
Coefficient of Variation 35.745
  • variable_03 is not normally distributed (p-value 0.0014395448340091754)
  • variable_03 has 78 outliers

variable_04

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 0.01821
Minimum -4.7816
Maximum 2.7446
Zeros 0
Zeros (%) 0.0%
Negatives 3388
Negatives (%) 47.8%
  • variable_04 is skewed left (γ1 = -0.5527)

Quantile Statistics

Minimum -4.7816
5-th Percentile -0.9163
Q1 -0.3051
Median 0.03171
Q3 0.3707
95-th Percentile 0.8702
Maximum 2.7446
Range 7.5262
IQR 0.6759

Descriptive Statistics

Mean 0.01821
Standard Deviation 0.5991
Variance 0.3589
Sum 129.2338
Skewness -0.5527
Kurtosis 4.7394
Coefficient of Variation 32.8885
  • variable_04 is not normally distributed (p-value 7.087961039050091e-06)
  • variable_04 has 310 outliers

variable_05

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -0.0325
Minimum -2.8149
Maximum 3.9445
Zeros 0
Zeros (%) 0.0%
Negatives 3459
Negatives (%) 48.8%
  • variable_05 is skewed left (γ1 = -0.3857)

Quantile Statistics

Minimum -2.8149
5-th Percentile -1.0748
Q1 -0.3842
Median 0.01131
Q3 0.3929
95-th Percentile 0.77
Maximum 3.9445
Range 6.7594
IQR 0.7771

Descriptive Statistics

Mean -0.0325
Standard Deviation 0.5759
Variance 0.3317
Sum -230.6034
Skewness -0.3857
Kurtosis 1.0407
Coefficient of Variation -17.7196
  • variable_05 is not normally distributed (p-value 4.756665055516031e-05)
  • variable_05 has 88 outliers

variable_06

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -0.02707
Minimum -19.2543
Maximum 11.5729
Zeros 0
Zeros (%) 0.0%
Negatives 3851
Negatives (%) 54.3%
  • variable_06 is skewed left (γ1 = -8.037)

Quantile Statistics

Minimum -19.2543
5-th Percentile -0.724
Q1 -0.1936
Median -0.02663
Q3 0.1633
95-th Percentile 0.8138
Maximum 11.5729
Range 30.8273
IQR 0.3569

Descriptive Statistics

Mean -0.02707
Standard Deviation 0.9322
Variance 0.869
Sum -192.0288
Skewness -8.037
Kurtosis 169.6643
Coefficient of Variation -34.4416
  • variable_06 is not normally distributed (p-value 1.7605860656942484e-21)
  • variable_06 has 762 outliers

variable_07

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 0.02709
Minimum -8.887
Maximum 8.362
Zeros 0
Zeros (%) 0.0%
Negatives 3465
Negatives (%) 48.8%
  • variable_07 is skewed right (γ1 = 0.74)

Quantile Statistics

Minimum -8.887
5-th Percentile -1.1229
Q1 -0.5218
Median 0.0234
Q3 0.5447
95-th Percentile 1.1818
Maximum 8.362
Range 17.249
IQR 1.0664

Descriptive Statistics

Mean 0.02709
Standard Deviation 0.8306
Variance 0.6899
Sum 192.176
Skewness 0.74
Kurtosis 12.6245
Coefficient of Variation 30.6653
  • variable_07 is not normally distributed (p-value 8.403651692097962e-09)
  • variable_07 has 99 outliers

variable_08

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 0.1355
Minimum -22.7976
Maximum 27.2028
Zeros 0
Zeros (%) 0.0%
Negatives 3134
Negatives (%) 44.2%
  • variable_08 is skewed left (γ1 = -1.6015)

Quantile Statistics

Minimum -22.7976
5-th Percentile -0.7163
Q1 -0.1948
Median 0.05933
Q3 0.3995
95-th Percentile 1.7913
Maximum 27.2028
Range 50.0004
IQR 0.5943

Descriptive Statistics

Mean 0.1355
Standard Deviation 1.4288
Variance 2.0416
Sum 961.1055
Skewness -1.6015
Kurtosis 93.4357
Coefficient of Variation 10.5479
  • variable_08 is not normally distributed (p-value 4.467522021739658e-21)
  • variable_08 has 822 outliers

variable_09

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 0.1289
Minimum -10.201
Maximum 11.059
Zeros 0
Zeros (%) 0.0%
Negatives 3609
Negatives (%) 50.9%
  • variable_09 is skewed right (γ1 = 1.6198)

Quantile Statistics

Minimum -10.201
5-th Percentile -0.6314
Q1 -0.194
Median -0.00659
Q3 0.3119
95-th Percentile 1.4693
Maximum 11.059
Range 21.26
IQR 0.5059

Descriptive Statistics

Mean 0.1289
Standard Deviation 0.842
Variance 0.709
Sum 914.3418
Skewness 1.6198
Kurtosis 31.6693
Coefficient of Variation 6.5339
  • variable_09 is not normally distributed (p-value 1.7197957428391583e-18)
  • variable_09 has 780 outliers

variable_10

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 0.2091
Minimum -3.8345
Maximum 5.2283
Zeros 0
Zeros (%) 0.0%
Negatives 3168
Negatives (%) 44.6%
  • variable_10 is skewed right (γ1 = 0.5531)

Quantile Statistics

Minimum -3.8345
5-th Percentile -1.5021
Q1 -0.4438
Median 0.111
Q3 0.7381
95-th Percentile 2.4208
Maximum 5.2283
Range 9.0629
IQR 1.1819

Descriptive Statistics

Mean 0.2091
Standard Deviation 1.1284
Variance 1.2733
Sum 1483.6352
Skewness 0.5531
Kurtosis 1.2361
Coefficient of Variation 5.3963
  • variable_10 is not normally distributed (p-value 0.00029778141634744957)
  • variable_10 has 385 outliers

variable_11

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -0.7144
Minimum -9.4987
Maximum 3.7903
Zeros 0
Zeros (%) 0.0%
Negatives 4127
Negatives (%) 58.2%
  • variable_11 is skewed left (γ1 = -1.9048)

Quantile Statistics

Minimum -9.4987
5-th Percentile -5.0715
Q1 -0.9913
Median -0.1805
Q3 0.4031
95-th Percentile 1.4236
Maximum 3.7903
Range 13.2891
IQR 1.3945

Descriptive Statistics

Mean -0.7144
Standard Deviation 2.0501
Variance 4.2029
Sum -5068.3928
Skewness -1.9048
Kurtosis 3.9043
Coefficient of Variation -2.8698
  • variable_11 is not normally distributed (p-value 5.402403668388217e-07)
  • variable_11 has 838 outliers

variable_12

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -2.1391
Minimum -25.1628
Maximum 6.7394
Zeros 0
Zeros (%) 0.0%
Negatives 4448
Negatives (%) 62.7%
  • variable_12 is skewed left (γ1 = -2.1918)

Quantile Statistics

Minimum -25.1628
5-th Percentile -13.9342
Q1 -1.3707
Median -0.2889
Q3 0.333
95-th Percentile 1.5041
Maximum 6.7394
Range 31.9022
IQR 1.7037

Descriptive Statistics

Mean -2.1391
Standard Deviation 5.0636
Variance 25.6402
Sum -15177.0698
Skewness -2.1918
Kurtosis 4.5115
Coefficient of Variation -2.3671
  • variable_12 is not normally distributed (p-value 1.3563209558788549e-15)
  • variable_12 has 1571 outliers

variable_13

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -1.3208
Minimum -14.1299
Maximum 4.449
Zeros 0
Zeros (%) 0.0%
Negatives 4114
Negatives (%) 58.0%
  • variable_13 is skewed left (γ1 = -1.9545)

Quantile Statistics

Minimum -14.1299
5-th Percentile -8.1407
Q1 -1.8433
Median -0.2235
Q3 0.3988
95-th Percentile 1.3709
Maximum 4.449
Range 18.5788
IQR 2.2421

Descriptive Statistics

Mean -1.3208
Standard Deviation 3.0001
Variance 9.0004
Sum -9371.2246
Skewness -1.9545
Kurtosis 3.5077
Coefficient of Variation -2.2714
  • variable_13 is not normally distributed (p-value 1.2262729048724355e-10)
  • variable_13 has 797 outliers

variable_14

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -0.005751
Minimum -4.4989
Maximum 3.3111
Zeros 0
Zeros (%) 0.0%
Negatives 3394
Negatives (%) 47.8%
  • variable_14 is skewed left (γ1 = -0.4555)

Quantile Statistics

Minimum -4.4989
5-th Percentile -1.6485
Q1 -0.5689
Median 0.04683
Q3 0.6605
95-th Percentile 1.3491
Maximum 3.3111
Range 7.81
IQR 1.2294

Descriptive Statistics

Mean -0.005751
Standard Deviation 0.9369
Variance 0.8779
Sum -40.8041
Skewness -0.4555
Kurtosis 0.5267
Coefficient of Variation -162.9148
  • variable_14 has 104 outliers

variable_15

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -2.3056
Minimum -19.2143
Maximum 6.9304
Zeros 0
Zeros (%) 0.0%
Negatives 4491
Negatives (%) 63.3%
  • variable_15 is skewed left (γ1 = -1.5768)

Quantile Statistics

Minimum -19.2143
5-th Percentile -11.4862
Q1 -4.3384
Median -0.3628
Q3 0.2889
95-th Percentile 1.1755
Maximum 6.9304
Range 26.1447
IQR 4.6273

Descriptive Statistics

Mean -2.3056
Standard Deviation 4.1441
Variance 17.1733
Sum -16358.5363
Skewness -1.5768
Kurtosis 1.7359
Coefficient of Variation -1.7974
  • variable_15 is not normally distributed (p-value 6.154457201416966e-14)
  • variable_15 has 373 outliers

variable_16

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -0.05671
Minimum -3.4869
Maximum 3.7796
Zeros 0
Zeros (%) 0.0%
Negatives 3699
Negatives (%) 52.1%
  • variable_16 is skewed right (γ1 = 0.0483)

Quantile Statistics

Minimum -3.4869
5-th Percentile -1.6775
Q1 -0.7612
Median -0.05304
Q3 0.639
95-th Percentile 1.5807
Maximum 3.7796
Range 7.2665
IQR 1.4003

Descriptive Statistics

Mean -0.05671
Standard Deviation 1.0101
Variance 1.0202
Sum -402.3724
Skewness 0.04827
Kurtosis -0.0452
Coefficient of Variation -17.8102
  • variable_16 is not normally distributed (p-value 0.0004876885976066269)
  • variable_16 has 49 outliers

variable_17

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -2.0433
Minimum -18.6837
Maximum 4.0635
Zeros 0
Zeros (%) 0.0%
Negatives 4285
Negatives (%) 60.4%
  • variable_17 is skewed left (γ1 = -1.9245)

Quantile Statistics

Minimum -18.6837
5-th Percentile -11.3373
Q1 -3.0013
Median -0.2904
Q3 0.3735
95-th Percentile 1.0945
Maximum 4.0635
Range 22.7472
IQR 3.3747

Descriptive Statistics

Mean -2.0433
Standard Deviation 4.0112
Variance 16.0894
Sum -14497.4229
Skewness -1.9245
Kurtosis 3.247
Coefficient of Variation -1.9631
  • variable_17 is not normally distributed (p-value 4.621362979772345e-12)
  • variable_17 has 675 outliers

variable_18

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 1.2399
Minimum -3.0739
Maximum 12.0189
Zeros 0
Zeros (%) 0.0%
Negatives 2579
Negatives (%) 36.4%
  • variable_18 is skewed right (γ1 = 1.5134)

Quantile Statistics

Minimum -3.0739
5-th Percentile -1.4536
Q1 -0.4527
Median 0.5434
Q3 2.13
95-th Percentile 6.3942
Maximum 12.0189
Range 15.0928
IQR 2.5827

Descriptive Statistics

Mean 1.2399
Standard Deviation 2.4863
Variance 6.1814
Sum 8796.8399
Skewness 1.5134
Kurtosis 2.4288
Coefficient of Variation 2.0053
  • variable_18 is not normally distributed (p-value 0.0016643076972910795)
  • variable_18 has 421 outliers

variable_19

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -1.828
Minimum -24.5883
Maximum 13.5602
Zeros 0
Zeros (%) 0.0%
Negatives 4849
Negatives (%) 68.3%
  • variable_19 is skewed left (γ1 = -2.1848)

Quantile Statistics

Minimum -24.5883
5-th Percentile -11.5627
Q1 -2.5155
Median -0.4284
Q3 0.1611
95-th Percentile 1.3913
Maximum 13.5602
Range 38.1484
IQR 2.6767

Descriptive Statistics

Mean -1.828
Standard Deviation 3.8108
Variance 14.5221
Sum -12970.0015
Skewness -2.1848
Kurtosis 5.6317
Coefficient of Variation -2.0846
  • variable_19 is not normally distributed (p-value 1.7621908916579334e-15)
  • variable_19 has 714 outliers

variable_20

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -0.8481
Minimum -13.4341
Maximum 8.8886
Zeros 0
Zeros (%) 0.0%
Negatives 4585
Negatives (%) 64.6%
  • variable_20 is skewed left (γ1 = -1.5038)

Quantile Statistics

Minimum -13.4341
5-th Percentile -4.8964
Q1 -1.5362
Median -0.4123
Q3 0.3406
95-th Percentile 1.5682
Maximum 8.8886
Range 22.3227
IQR 1.8768

Descriptive Statistics

Mean -0.8481
Standard Deviation 2.0225
Variance 4.0904
Sum -6017.5632
Skewness -1.5038
Kurtosis 3.948
Coefficient of Variation -2.3846
  • variable_20 is not normally distributed (p-value 7.844487778454791e-09)
  • variable_20 has 501 outliers

variable_21

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 0.3716
Minimum -41.0443
Maximum 20.0072
Zeros 0
Zeros (%) 0.0%
Negatives 2913
Negatives (%) 41.1%
  • variable_21 is skewed left (γ1 = -1.2964)

Quantile Statistics

Minimum -41.0443
5-th Percentile -1.468
Q1 -0.1905
Median 0.1059
Q3 0.6471
95-th Percentile 3.4446
Maximum 20.0072
Range 61.0515
IQR 0.8375

Descriptive Statistics

Mean 0.3716
Standard Deviation 2.9653
Variance 8.7929
Sum 2636.213
Skewness -1.2964
Kurtosis 44.9274
Coefficient of Variation 7.9807
  • variable_21 is not normally distributed (p-value 8.680776886744079e-23)
  • variable_21 has 933 outliers

variable_22

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -1.6986
Minimum -43.5572
Maximum 30.8977
Zeros 0
Zeros (%) 0.0%
Negatives 4186
Negatives (%) 59.0%
  • variable_22 is skewed left (γ1 = -3.2779)

Quantile Statistics

Minimum -43.5572
5-th Percentile -12.5753
Q1 -1.5376
Median -0.2793
Q3 0.3917
95-th Percentile 1.3666
Maximum 30.8977
Range 74.4549
IQR 1.9293

Descriptive Statistics

Mean -1.6986
Standard Deviation 4.6296
Variance 21.4336
Sum -12051.2648
Skewness -3.2779
Kurtosis 14.4265
Coefficient of Variation -2.7256
  • variable_22 is not normally distributed (p-value 6.194555385326872e-20)
  • variable_22 has 907 outliers

variable_23

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -0.4777
Minimum -10.9088
Maximum 17.0199
Zeros 0
Zeros (%) 0.0%
Negatives 4979
Negatives (%) 70.2%
  • variable_23 is skewed right (γ1 = 0.4851)

Quantile Statistics

Minimum -10.9088
5-th Percentile -3.0199
Q1 -1.1867
Median -0.523
Q3 0.1698
95-th Percentile 2.5868
Maximum 17.0199
Range 27.9287
IQR 1.3566

Descriptive Statistics

Mean -0.4777
Standard Deviation 1.566
Variance 2.4524
Sum -3389.5486
Skewness 0.4851
Kurtosis 4.309
Coefficient of Variation -3.278
  • variable_23 is not normally distributed (p-value 7.029626385234411e-13)
  • variable_23 has 677 outliers

variable_24

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -0.9434
Minimum -28.3638
Maximum 16.2381
Zeros 0
Zeros (%) 0.0%
Negatives 4146
Negatives (%) 58.4%
  • variable_24 is skewed left (γ1 = -2.9061)

Quantile Statistics

Minimum -28.3638
5-th Percentile -8.176
Q1 -1.1874
Median -0.261
Q3 0.5668
95-th Percentile 2.1358
Maximum 16.2381
Range 44.6018
IQR 1.7542

Descriptive Statistics

Mean -0.9434
Standard Deviation 3.4231
Variance 11.7177
Sum -6693.5748
Skewness -2.9061
Kurtosis 11.5542
Coefficient of Variation -3.6284
  • variable_24 is not normally distributed (p-value 1.1424042761789952e-14)
  • variable_24 has 780 outliers

variable_25

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 1.4802
Minimum -5.0241
Maximum 12.1147
Zeros 0
Zeros (%) 0.0%
Negatives 2438
Negatives (%) 34.4%
  • variable_25 is skewed right (γ1 = 1.237)

Quantile Statistics

Minimum -5.0241
5-th Percentile -1.9212
Q1 -0.4955
Median 0.6123
Q3 2.7618
95-th Percentile 7.6019
Maximum 12.1147
Range 17.1388
IQR 3.2573

Descriptive Statistics

Mean 1.4802
Standard Deviation 2.922
Variance 8.5382
Sum 10501.8859
Skewness 1.237
Kurtosis 1.3892
Coefficient of Variation 1.9741
  • variable_25 is not normally distributed (p-value 3.2751007221297106e-05)
  • variable_25 has 347 outliers

variable_26

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -2.1922
Minimum -31.1037
Maximum 3.9345
Zeros 0
Zeros (%) 0.0%
Negatives 4340
Negatives (%) 61.2%
  • variable_26 is skewed left (γ1 = -2.7907)

Quantile Statistics

Minimum -31.1037
5-th Percentile -13.328
Q1 -2.6511
Median -0.5881
Q3 0.6548
95-th Percentile 1.8757
Maximum 3.9345
Range 35.0382
IQR 3.3059

Descriptive Statistics

Mean -2.1922
Standard Deviation 5.1586
Variance 26.6108
Sum -15553.8558
Skewness -2.7907
Kurtosis 8.8935
Coefficient of Variation -2.3531
  • variable_26 is not normally distributed (p-value 8.733068522513833e-09)
  • variable_26 has 628 outliers

variable_27

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean 1.1371
Minimum -16.4809
Maximum 22.0577
Zeros 0
Zeros (%) 0.0%
Negatives 2441
Negatives (%) 34.4%
  • variable_27 is skewed right (γ1 = 1.9285)

Quantile Statistics

Minimum -16.4809
5-th Percentile -1.7543
Q1 -0.3264
Median 0.5574
Q3 1.6964
95-th Percentile 6.9163
Maximum 22.0577
Range 38.5386
IQR 2.0228

Descriptive Statistics

Mean 1.1371
Standard Deviation 3.0146
Variance 9.0877
Sum 8067.4478
Skewness 1.9285
Kurtosis 8.6289
Coefficient of Variation 2.6512
  • variable_27 is not normally distributed (p-value 5.805903487957552e-13)
  • variable_27 has 741 outliers

variable_28

numerical

Approximate Distinct Count 7081
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 113520
Mean -1.4178
Minimum -30.5524
Maximum 2.3521
Zeros 0
Zeros (%) 0.0%
Negatives 4180
Negatives (%) 58.9%
  • variable_28 is skewed left (γ1 = -3.2216)

Quantile Statistics

Minimum -30.5524
5-th Percentile -10.6291
Q1 -1.7423
Median -0.4595
Q3 1.1646
95-th Percentile 2.0532
Maximum 2.3521
Range 32.9044
IQR 2.9069

Descriptive Statistics

Mean -1.4178
Standard Deviation 4.4861
Variance 20.1255
Sum -10058.9969
Skewness -3.2216
Kurtosis 12.6525
Coefficient of Variation -3.1643
  • variable_28 is not normally distributed (p-value 1.7844321060402489e-09)
  • variable_28 has 587 outliers

is_fraud

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 482460
  • The largest value (0.0) is over 2.0 times larger than the second largest value (1.0)

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 14190
  • The top 2 categories (0.0, 1.0) take over 50.0%
  • The largest value (00) is over 2.0 times larger than the second largest value (10)
  • is_fraud has words of constant length
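The is_fraud insights refer to the class counts: the most frequent label (0.0) appears more than twice as often as the second one (1.0). A quick way to inspect this balance with pandas is sketched below; the helper is hypothetical and the toy labels are made up, since the report does not show the actual counts.

```python
import pandas as pd


def class_balance(labels: pd.Series):
    """Return counts and shares per class, plus the largest-to-second-largest ratio."""
    counts = labels.value_counts()  # sorted from most to least frequent
    table = pd.DataFrame({"count": counts, "share": counts / counts.sum()})
    ratio = counts.iloc[0] / counts.iloc[1] if len(counts) > 1 else float("inf")
    return table, ratio


toy = pd.Series(["0.0"] * 7 + ["1.0"] * 3)
table, ratio = class_balance(toy)
print(table)
print(f"largest / second largest = {ratio:.2f}")  # 2.33
```

On the notebook's data this would be class_balance(df["is_fraud"]); value_counts works whether the labels are stored as strings or floats.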

Interactions, Correlations, Missing Values (interactive plots, not included in this export)

Report generated with DataPrep
